Our question involves disparity in the care for the elderly based on income and racial distrbutrions in California. We will be using spatial analysis to determine if there is a spatial pattern based on the dispersion (or lackthereof) of care facilities based on their bed size/capacity size.
We will be testing the hypothesis that facility capacity is correlated with income and race. There is also reason to believe that location matters: areas with neighboring low-income, POC populations will correspond with less bed sizes.
To test this, we will need to test if there is a spatial pattern, and then carry on to run some spatial regression models.
We will start with two basic datasets provided by Jen on the elderly/adult care facilities in CA.
Things are looking a little messy, so let’s clean up the dataset so we only keep the variables of interest.
adult.prep <- adult.res %>%
select(Type, Name, Administrator, Address, City, State, Zip, `County Name`, Capacity, `License First Date`, Status) %>%
filter(Type == "ADULT RESIDENTIAL" & Status == "LICENSED")
elderly.prep <- elderly.res %>%
select(Type, Name, Administrator, Address, City, State, Zip, `County Name`, Capacity, `License First Date`, Status) %>%
filter(Type == "RESIDENTIAL CARE ELDERLY" & Status == "LICENSED")
#Let's join the two
care.facilities <- adult.prep %>%
full_join(elderly.prep)
write_csv(care.facilities, "facilities.prepped.csv")
Now we can just load the prepped csv, and begin working on it. * First, we want to get combine the address, city, state, zip into one address * Then we can send that to google maps api to get the geo coordinates for mapping
care.facilities <- read.csv("facilities.prepped.csv")
care.facilities$Location <- paste0(care.facilities$Address, ", ", care.facilities$City, ", ", care.facilities$State, care.facilities$Zip)
geo <- geocode(location = care.facilities$Location, output="latlon", source="google")
care.facilities$Lat <- geo$lat
care.facilities$Long <- geo$lon
Adding in census data.
census_api_key(Sys.getenv("CENSUS_API_KEY"))
care.facilities <- read_csv("care.fac.geo.csv")
#get variables
census_var <- load_variables(2015, "acs5", cache = TRUE)
test <- get_acs(geography = "tract", variables = "B06012_001E", state="CA", geometry = TRUE)
#getting data based county
ca.income.county <- get_acs(geography = "county", variables = "B06011_001E", state = "CA", geometry = TRUE)
Mapping
library(leaflet)
library(stringr)
library(sf)
## Linking to GEOS 3.6.1, GDAL 2.2.0, proj.4 4.9.3
library(viridis)
## Loading required package: viridisLite
library(viridisLite)
care.facilities <- read_csv("care.fac.geo.csv")
## Parsed with column specification:
## cols(
## Type = col_character(),
## Name = col_character(),
## Administrator = col_character(),
## Address = col_character(),
## City = col_character(),
## State = col_character(),
## Zip = col_integer(),
## County.Name = col_character(),
## Capacity = col_integer(),
## License.First.Date = col_character(),
## Status = col_character(),
## Location = col_character(),
## Lat = col_double(),
## Long = col_double()
## )
#getting data on median income, census tract
ca.house.income <- get_acs(geography = "tract", variables = "B19013_001", state = "CA", geometry = TRUE)
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
ca.home.value <- get_acs(geography = "tract", variables = "B25077_001", state = "CA", geometry = TRUE)
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.
#setting color
pal <- colorNumeric(palette = "viridis",
domain = ca.house.income$estimate)
getColor <- function(care.facilities) {
sapply(care.facilities$Capacity, function(Capacity) {
if(Capacity <= 12) {
"red"
} else if(Capacity >= 13 & Capacity <= 24) {
"orange"
} else if(Capacity >= 25 & Capacity <=100) {
"yellow"
} else {
"green"
} } )
}
ca.map <- ca.house.income %>%
st_transform(crs = "+init=epsg:4326") %>%
leaflet(width = "100%") %>%
addTiles() %>%
addPolygons(stroke= FALSE,
smoothFactor = 0,
fillOpacity = 0.7,
color = ~ pal(estimate)) %>%
addCircles(data = care.facilities,
lat = ~Lat,
lng = ~Long,
label = ~as.character(Name),
weight = 2,
radius = 20,
stroke = TRUE,
fillOpacity = 0.8,
color = getColor(care.facilities)
) %>%
addLegend("bottomright",
pal = pal,
values = ~ estimate,
title = "Household Income",
labFormat = labelFormat(prefix = "$"),
opacity = 1) %>%
addLegend("bottomleft",
colors= c("Red", "Orange", "Yellow", "Green"),
labels = c("4-12 Beds", "13-24 Beds", "25-100 Beds", "101+ Beds"),
title = "Capacity Size")
## Warning in validateCoords(lng, lat, funcName): Data contains 433 rows with
## either missing or invalid lat/lon values and will be ignored
Okay so for the most part, I figured it out, no idea why there are some corrdinates in like Michigan. Time to aggregate data for regression analysis.
What I’m thinking is we can try to map all the points and see what it looks like first. Using circle radius perhaps? Checklist of things:
#save the map
library(htmlwidgets)
saveWidget(ca.map, "ca.map.html", selfcontained = FALSE)
ca.map
Stuff over time:
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
care.facilities$date <- mdy(care.facilities$License.First.Date)
care.facilities$year <- year(care.facilities$date)
levels <- c(-Inf, 12, 25, 100, Inf)
labels <- c("small", "medium", "large", "x-large")
care.facilities$size <- cut(care.facilities$Capacity, levels, labels)
#time to re-organize data for graphing
growth <- care.facilities %>%
select(size, year) %>%
group_by(year) %>%
count(size)
growth.plot <- ggplot(growth, aes(x=year, y=n, color=size)) +
geom_line()
growth.plot <- growth.plot + ggtitle("Growth of Care Facilities, CA") +
xlab("Year") +
ylab("Facilities Opened")
growth.plot
## Warning: Removed 1 rows containing missing values (geom_path).
Okay, so now we need to prepare data for spatial regression. We can accomplishment this by aggregating the data in a way that satisfies our questions: Is facility size influenced by income? Or maybe race?
To do this, let’s create a dummy variable that gives us a “1” if the facility is small, and a “0” if it is not. The amount of small facilities in a census blocked will be our dependent variable in our regression models.
care.facilities <- care.facilities %>%
mutate(dummy = ifelse(size == "small", 1, 0))
#create a new csv with just the variables we need
care.regression <- care.facilities %>%
select(Name, City, Lat, Long, dummy)
write.csv(care.regression, "care.regression.csv")
#playing around with the data
care.reg.agg <- care.regression %>%
select(City, dummy) %>%
group_by(City) %>%
summarise(small = sum(dummy == 1))
#It might be possible to group by city and do a proportion?
care.city <- care.facilities %>%
select(City, size) %>%
group_by(City) %>%
summarise(small = sum(size == "small"),
other = sum(size != "small"),
ratio = small/other)
care.city.measure <- care.facilities %>%
select(City, size) %>%
group_by(City) %>%
count(size)
ca.house.income <- get_acs(geography = "tract", variables = "B19013_001", state = "CA", geometry = TRUE)
## Downloading feature geometry from the Census website. To cache shapefiles for use in future sessions, set `options(tigris_use_cache = TRUE)`.